Learning gradients via an early stopping gradient descent method
Similar resources
On Early Stopping in Gradient Descent Learning
In this paper, we study a family of gradient descent algorithms to approximate the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by a polynomial decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. Thes...
On Early Stopping in Gradient Descent Boosting
In this paper, we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces. Here early stopping plays the role of regularization, where given a finite sample and some regularity condition on the regression function, a stopping rule is given and some probabilistic upper bounds are obtained for the distance between the function iter...
Learning ReLUs via Gradient Descent
In this paper we study the problem of learning Rectified Linear Units (ReLUs) which are functions of the form x ↦ max(0, ⟨w,x⟩) with w ∈ ℝ^d denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captu...
Learning via Gradient Descent in Sigma
Integrating a gradient-descent learning mechanism at the core of the graphical models upon which the Sigma cognitive architecture/system is built yields learning behaviors that span important forms of both procedural learning (e.g., action and reinforcement learning) and declarative learning (e.g., supervised and unsupervised concept formation), plus several additional forms of learning (e.g., ...
VAE Learning via Stein Variational Gradient Descent
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupe...
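The first two entries above describe gradient descent in a reproducing kernel Hilbert space with polynomially decreasing step sizes, where the number of iterations acts as the regularization parameter. The following is a minimal Python sketch of that idea. It rests on several assumptions: a Gaussian kernel, the step schedule gamma_t = (t + 1)^(-theta), and a held-out validation set used to pick the stopping iteration. That last choice is a data-driven heuristic swapped in for illustration. It is not the a priori stopping rule derived in those papers, and it is not the gradient-learning method of the main paper. The names `gaussian_kernel`, `predict`, and `kernel_gd_early_stopping` are hypothetical.

```python
import math
import random


def gaussian_kernel(x, z, sigma=0.2):
    """Gaussian (RBF) kernel on scalars; K(x, x) = 1, so kappa^2 = 1."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def predict(a, x_train, x, sigma=0.2):
    """Evaluate f(x) = sum_j a[j] * K(x_j, x)."""
    return sum(aj * gaussian_kernel(xj, x, sigma) for aj, xj in zip(a, x_train))


def kernel_gd_early_stopping(x_train, y_train, x_val, y_val,
                             theta=0.25, max_iter=500, sigma=0.2):
    """Least-squares kernel gradient descent (hypothetical sketch):
    f_0 = 0 and
    f_{t+1} = f_t - gamma_t * (1/m) * sum_i (f_t(x_i) - y_i) K(x_i, .),
    with polynomially decreasing steps gamma_t = (t + 1) ** (-theta).
    Returns the coefficients of the iterate with the lowest validation
    error (a stopping heuristic assumed for illustration).
    """
    m = len(x_train)
    K = [[gaussian_kernel(xi, xj, sigma) for xj in x_train] for xi in x_train]
    K_val = [[gaussian_kernel(xv, xj, sigma) for xj in x_train] for xv in x_val]
    a = [0.0] * m  # f_t = sum_j a[j] K(x_j, .), starting from f_0 = 0
    best_err, best_t, best_a = float("inf"), 0, a[:]
    for t in range(max_iter):
        # Validation mean squared error of the current iterate f_t.
        pred_val = [sum(row[j] * a[j] for j in range(m)) for row in K_val]
        err = sum((p - y) ** 2 for p, y in zip(pred_val, y_val)) / len(y_val)
        if err < best_err:
            best_err, best_t, best_a = err, t, a[:]
        # Training residuals f_t(x_i) - y_i, then one gradient step.
        resid = [sum(K[i][j] * a[j] for j in range(m)) - y_train[i]
                 for i in range(m)]
        gamma = (t + 1) ** (-theta)  # polynomially decreasing step size
        a = [a[i] - gamma * resid[i] / m for i in range(m)]
    return best_a, best_t, best_err


# Synthetic example: noisy samples of sin(2*pi*x) on [0, 1].
random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(60)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.1) for x in xs]
a, t_stop, val_err = kernel_gd_early_stopping(xs[:40], ys[:40], xs[40:], ys[40:])
print(t_stop, round(val_err, 4))
```

Keeping the best validation iterate rather than the last one is what lets the iteration count act as a regularizer in this sketch. Running much longer drives the training residual toward zero, and the iterates eventually start fitting the noise.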
Journal
Journal title: Journal of Approximation Theory
Year: 2010
ISSN: 0021-9045
DOI: 10.1016/j.jat.2010.05.004